API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs (Abstract)

Three open questions

(1) How effective are current LLMs in utilizing tools?

we develop a runnable evaluation system consisting of 73 API tools.

We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs.
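
As a rough illustration of what assessing API calls against annotated dialogues could look like, here is a minimal sketch of exact-match scoring. The `ApiCall` type, `make_call`, `call_accuracy`, and the example API names are hypothetical and do not come from the paper; the paper's actual evaluation protocol may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """Hypothetical representation of one annotated API call."""
    name: str
    params: tuple  # sorted (key, value) pairs, so argument order does not matter


def make_call(name, **params):
    # Normalize keyword arguments into a sorted tuple for order-independent comparison.
    return ApiCall(name, tuple(sorted(params.items())))


def call_accuracy(predicted, gold):
    """Fraction of gold calls that the prediction matches exactly (name and parameters)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned one-to-one")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Assumed example data, not taken from the benchmark.
gold = [make_call("AddAlarm", time="08:00"), make_call("GetWeather", city="Tokyo")]
pred = [make_call("AddAlarm", time="08:00"), make_call("GetWeather", city="Osaka")]
score = call_accuracy(pred, gold)
```

In this sketch a call counts as correct only if both the API name and every parameter match, so `score` here is 0.5. A more lenient scorer could give partial credit for a correct API name with wrong arguments.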

(2) How can we enhance LLMs' ability to utilize tools?

we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.

Lynx was trained on this dataset, starting from Alpaca.

(3) What obstacles need to be overcome to leverage tools?

Future research directions (from an error analysis of Lynx)